THE TRANSMISSION OF INFORMATION-II 


R. M. FANO 


TECHNICAL REPORT NO. 149 

FEBRUARY 6, 1950 


RESEARCH LABORATORY OF ELECTRONICS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 





Abstract 

This report, which is a continuation of Technical Report No. 65, deals
with the transmission of information through a discrete channel disturbed
by noise. It considers, in particular, the problem of reducing the
equivocation by coding. The behavior of the per-unit equivocation as a function
of the channel equivocation is computed in an approximate manner for a 
number of coding schemes of increasing complexity. The results indicate 
that the limiting condition of vanishing per-unit equivocation is 
approached rather slowly with increasing code complexity. 



THE TRANSMISSION OF INFORMATION - II 


The basic theory underlying the process of transmitting information by 
means of discrete signals has been presented in Technical Report No. 65.

The transmitter was considered as a device able to indicate successively, 
by means of a conventional signal, one choice out of a finite group of 
possible choices. It was assumed that the received signal was uniquely 
related to the transmitted signal; in other words, that the receiver could 
detect correctly at all times the choices indicated by the transmitter. It 
is hardly necessary to say that this ideal situation does not belong to the 
physical world. In practice the choices indicated to the receiver by the 
signal are not necessarily the same as those which the transmitter intended
to indicate, and are not uniquely related to them. This situation arises
because of the presence of some random process which modifies the signal in
an unpredictable manner. We shall refer to any such random process as
"noise" regardless of its exact physical nature. This report is devoted to
the study of the process of communication by means of discrete signals in 
the presence of noise. 

The first three sections of the report are almost identical with the 
second chapter of the notes written in the Spring of 1949 to accompany a
series of lectures given by the author for the staff of the Raytheon Manu- 
facturing Company. The material covered in these sections does not extend 
beyond the theory developed by N. Wiener (1) and C. Shannon (2). The last 
section presents some recent work done by the author on the noise-reduction 
properties of coding. 

1. Equivocation and net information transmitted 

Our first task will be to compute the average amount of information 
received in the presence of noise when the statistical characteristics of 
both signal and noise are known. We shall limit our analysis at first to 
the case of independent selections which are also affected independently by
noise. Let x be a discrete random variable representing the choice 
selected at the transmitter and y be another discrete random variable 
representing the corresponding choice indicated to the receiver by the 
signal. The same number N of choices are available at both the transmitter 
and the receiver. The statistical nature of the signal generated by the 
transmitter is given as usual in terms of a set of N probabilities p(x). 

The average amount of information generated at the transmitter is then (3)

$$ H_x = -\sum_x p(x) \log_2 p(x) . \tag{1} $$

This summation is extended, of course, to the N values of x representing
the N choices, although this is not indicated in detail in order to
simplify the notation.

The effect of noise on the received signal can be represented
conveniently by a set of $N^2$ conditional probabilities p(x;y). More precisely,
p(x;y) is the probability that choice y is received when choice x is
transmitted. It is convenient to define three other sets of probabilities
related to the two just defined. The joint probability that x is
transmitted and y is received is given by

$$ p(xy) = p(x)\, p(x;y) . \tag{2} $$

The probability p(y) that y is received is given by

$$ p(y) = \sum_x p(xy) = \sum_x p(x)\, p(x;y) . \tag{3} $$

Finally, the conditional probability that x was transmitted when y is
received is given by

$$ p(y;x) = \frac{p(xy)}{p(y)} = \frac{p(x)\, p(x;y)}{\sum_x p(x)\, p(x;y)} . \tag{4} $$


Suppose that a choice y is indicated by the received signal. How much
information is conveyed, on the average, by this indication? The difficulty
in answering this question lies in the fact that the reception of y does 
not provide any definite knowledge about x; that is, about the choice indi- 
cated by the transmitter. In fact, any one of the choices could be the one 
actually selected. On the other hand, the reception of y obviously provides 
some knowledge about x, at least in a probability sense. 

Consider, for instance, a signal consisting of pulses and empty spaces, 
a priori equally likely to occur. Suppose that, because of noise, a pulse 
has a probability a of being received as an empty space, and an empty space
has the same probability a of being received as a pulse. If a pulse is
received we know that the probability that a pulse was actually transmitted
is 1 - a, while a is the probability that the pulse received was originally
an empty space. It is clear that if a < 0.5, some knowledge about the
transmitted signal has been provided by the reception of the pulse, even if 
we still do not know whether a pulse or an empty space was transmitted. 
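
As a minimal numerical sketch of the pulse example (the function name and
the sample values of a are illustrative assumptions, not part of the report),
the a posteriori probability that a pulse was transmitted, given that a pulse
was received, can be computed directly from the joint probabilities:

```python
def posterior_pulse(a, prior_pulse=0.5):
    """Probability that a pulse was transmitted, given that a pulse was received.

    a           -- probability that noise turns a pulse into a space, or a
                   space into a pulse
    prior_pulse -- a priori probability of a pulse (0.5 in the example)
    """
    joint_pulse = prior_pulse * (1.0 - a)    # pulse sent and received as a pulse
    joint_space = (1.0 - prior_pulse) * a    # space sent but received as a pulse
    return joint_pulse / (joint_pulse + joint_space)


if __name__ == "__main__":
    for a in (0.0, 0.1, 0.5):
        print(a, posterior_pulse(a))
```

With a = 0.5 the a posteriori probability equals the a priori value, so the
reception provides no knowledge; any a < 0.5 moves it toward unity.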

The situation can be stated in more general terms. The a priori
probability that x will be selected is p(x). After y has been received the a
posteriori probability that x was selected is p(y;x) as given by Eq. 4.
The problem is thus reduced to the evaluation of the information provided
by such a change of probability. To solve this problem we observe first
that the average amount of information that would be conveyed by the
correct indication of x is equal to $H_x$, as given by Eq. 1, when the p(x)
alone are known. If, on the contrary, the p(y;x) (corresponding to a
particular choice y indicated by the signal to the receiver) were also known,
the average amount of information that would be conveyed by the correct
indication of x becomes

$$ H_x(y) = -\sum_x p(y;x) \log_2 p(y;x) . \tag{5} $$

It follows that the additional information corresponding to the knowledge
of the N probabilities p(y;x), as a result of the reception of y, must be,
on the average,

$$ H_R(y) = H_x - H_x(y) . \tag{6} $$

To obtain the overall average amount of information per indication received,
we must average $H_R(y)$ with respect to y. The final result is then

$$ H_R = \sum_y p(y)\, H_R(y) = H_x - H_{y;x} \tag{7} $$

where

$$ H_{y;x} = -\sum_y \sum_x p(y)\, p(y;x) \log_2 p(y;x) . \tag{8} $$

The quantity $H_{y;x}$ represents the additional average amount of information
required for the determination of x when y is known or, in other words, the
average amount of information lost because of noise. The name "equivocation"
has been given by Shannon to this quantity to stress the fact that it
is a measure of the ambiguity of the received signal.
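
The quantities of Eqs. 1, 7 and 8 can be evaluated numerically for any small
channel. The following is a sketch only: the function name, the list-of-rows
layout of the noise probabilities, and the sample channel are assumptions
made for illustration.

```python
import math


def information_received(p_x, noise):
    """Return (H_x, H_yx, H_R) in bits for independent selections.

    p_x   -- list of the a priori probabilities p(x)
    noise -- square table; noise[x][y] is p(x;y), the probability that y is
             received when x is transmitted
    """
    n = len(p_x)
    # Eq. 3: probability of each received choice y.
    p_y = [sum(p_x[x] * noise[x][y] for x in range(n)) for y in range(n)]
    # Eq. 1: information generated at the transmitter.
    h_x = -sum(p * math.log2(p) for p in p_x if p > 0)
    # Eq. 8: equivocation, summed over the joint probabilities p(x) p(x;y).
    h_yx = 0.0
    for x in range(n):
        for y in range(n):
            joint = p_x[x] * noise[x][y]
            if joint > 0:
                h_yx -= joint * math.log2(joint / p_y[y])  # joint / p(y) is Eq. 4
    return h_x, h_yx, h_x - h_yx                           # last item is Eq. 7


if __name__ == "__main__":
    # Binary pulse example with error probability a = 0.1 (assumed value).
    print(information_received([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))
```

For this sample channel one binary unit is generated per selection, slightly
less than half of it is lost as equivocation, and the remainder is received.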

The idea involved in the above evaluation of $H_R$ can be made clearer by
means of the following physical illustration. Consider a system consisting
of a transmitter and a receiver which can communicate through a noiseless
channel as well as through a noisy channel. Suppose that a long message has
just been transmitted through the noisy channel, and suppose that the
knowledge of the message received has somehow been fed back to the transmitter.
Then consider separately all the instances in which a particular choice y
has been received together with the corresponding choices x transmitted.

If the noiseless channel is to be used next to provide the receiver
with a definite knowledge of these x, how many binary selections are
required, on the average (per N-order selection), for this purpose? Since
both receiver and transmitter know the set of probabilities p(y;x)
specified by the y received, a new code can be used, designed to be optimum for
this set of probabilities. The number of binary selections that must be
transmitted on the average for a given y is clearly just equal to $H_x(y)$ as
given by Eq. 5. Therefore $H_x(y)$ is the additional information that must be
transmitted to provide a definite knowledge of x. Of course, the code to
be used depends on y and the specification of the code is just the
information provided by the reception of y. Averaging the number of binary
selections required for all possible codes yields the value $H_{y;x}$ given by
Eq. 8. If the indication of x had been transmitted originally through the
noiseless channel, and a code optimum for the set of probabilities p(x) had
been used, the average number of binary selections required would have been
equal to $H_x$. In conclusion, the reception of y is equivalent, on the average,
to a number of binary selections (units of information) equal to $H_R$ as given
by Eq. 7.

The soundness of the above analysis from a physical point of view is
considerably impaired by the assumption that the transmitter has a knowledge
of the message received. This assumption is at the basis of the whole
physical procedure, since a more efficient code can be used only if
transmitter and receiver are in a position to agree on it. It does not seem
possible to eliminate this assumption while still maintaining arbitrariness
of the message. In other words, whenever random errors occur in the
transmission of a message, the number of binary selections required, on the
average, to correct such errors cannot be made smaller than the amount of
information conveyed by the message, if knowledge of the errors that
occurred is not available to the transmitter. Similarly any experienced
teacher knows that efficient teaching requires some sort of feedback, as
provided, for instance, by questions from the students or by the correction
of quizzes and homework. As a matter of fact, a good teacher, after finding
out what is in the minds of his students, again covers the whole subject
that has not been clearly understood, devoting to each part of its
explanation a time roughly proportional to how far the students are from correct
understanding. This process is very similar to making the number of binary
selections in the second transmission proportional to the logarithm of the
reciprocal of the probability p(y;x), as judged by the receiver on the basis
of the first transmission. It will be shown later how the need for feedback
can be eliminated by performing the transmission in such a way as to make
the probability of the occurrence of errors approach zero. It will be
possible to achieve this result by transmitting only certain types of
messages, rather than arbitrary messages. Roughly speaking, any uncertainty
on the part of the receiver will be avoided by transmitting only as much
information as can be received correctly through the given noisy channel.
Before proceeding to the discussion of this question we wish to interpret
from other points of view the results already obtained, in order to throw
additional light on the problem.

Suppose for a moment that the noise is considered as useful
information; that it is actually another signal, and that we wish to receive both
signals. The average amount of information per pair of selections x and y
(by which the noise is completely determined) can be expressed in the three
equivalent forms

$$ H_{xy} = H_x + H_{x;y} = H_y + H_{y;x} \tag{9} $$

where $H_x$ and $H_{y;x}$ are given by Eqs. 1 and 8, and

$$ H_{xy} = -\sum_x \sum_y p(xy) \log_2 p(xy) \tag{10} $$

$$ H_{x;y} = -\sum_x \sum_y p(x)\, p(x;y) \log_2 p(x;y) \tag{11} $$

$$ H_y = -\sum_y p(y) \log_2 p(y) . \tag{12} $$


The equivalence of the three expressions in Eq. 9 is evident from a physical
point of view, and can be proved without difficulty by substituting Eqs. 1,
8, 10, 11, 12 in Eq. 9. Considering again the noise as an undesired
disturbance, the information received through a noisy channel, given by Eq. 7,
can now be expressed in two other equivalent forms by means of Eq. 9.


$$ H_R = H_x - H_{y;x} = H_y - H_{x;y} = H_x + H_y - H_{xy} . \tag{13} $$


We have already seen that the first form can be interpreted as the total
information desired minus the additional information that must be
transmitted after y has been received. The second form can be interpreted as
the information corresponding to the reception of y if the channel were
noiseless, minus that part of this information which is actually just noise.
Finally, the third expression can be interpreted as the information
corresponding to the reception of y if the channel were noiseless plus the
information conveyed by x if y were not known, minus the information
corresponding to the pair xy. In other words, $H_R$ is the information common to
x and y.
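
The agreement of the three forms of Eq. 13 can be checked numerically. The
sketch below is illustrative only: the function names and the sample
probabilities are assumptions, and each entropy is computed independently
from its own definition (Eqs. 1, 8, 10, 11, 12) so that the comparison is not
circular.

```python
import math


def entropy(probs):
    """Entropy in bits of a list of probabilities (zero terms are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def three_forms_of_h_r(p_x, noise):
    """Evaluate the three expressions of Eq. 13.

    noise[x][y] is p(x;y), the probability that y is received when x is sent.
    """
    n = len(p_x)
    joint = [[p_x[x] * noise[x][y] for y in range(n)] for x in range(n)]  # Eq. 2
    p_y = [sum(joint[x][y] for x in range(n)) for y in range(n)]          # Eq. 3
    h_x = entropy(p_x)                                                    # Eq. 1
    h_y = entropy(p_y)                                                    # Eq. 12
    h_xy = entropy([p for row in joint for p in row])                     # Eq. 10
    h_noise = sum(p_x[x] * entropy(noise[x]) for x in range(n))           # Eq. 11
    h_equiv = -sum(joint[x][y] * math.log2(joint[x][y] / p_y[y])          # Eq. 8
                   for x in range(n) for y in range(n) if joint[x][y] > 0)
    return h_x - h_equiv, h_y - h_noise, h_x + h_y - h_xy


if __name__ == "__main__":
    # An asymmetric binary channel with unequal a priori probabilities (assumed).
    print(three_forms_of_h_r([0.7, 0.3], [[0.9, 0.1], [0.2, 0.8]]))
```

The three printed values coincide up to rounding error.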

Further light can be thrown on these interpretations by considering
the limiting case of sequences of selections of length n approaching
infinity. It is shown in Technical Report No. 65, Sect. 3 that, out of the
ensemble of all possible sequences, we can form a group of sequences whose
total probability approaches unity when n approaches infinity. The
frequencies of occurrence of the different choices in the sequences of this
group differ from the probabilities of the choices by amounts that also
approach zero when n approaches infinity. The number of sequences in this
group is then equal to the reciprocal of the probability of each sequence,
that is to

$$ S_x(n) = \prod_{x=0}^{N-1} p(x)^{-n p(x)} = 2^{n H_x} . \tag{14} $$


The same is true, of course, if we write y instead of x. Similarly, the
number of sequences of errors produced by noise belonging to the group
whose total probability approaches unity when n approaches infinity is

$$ S_{x;y}(n) = 2^{n H_{x;y}} . \tag{15} $$

This is, of course, the number of different sequences of y-selections that
have probability approaching unity of originating from a given sequence of
x-selections. The corresponding number of sequences of x-selections that
have a probability approaching unity of having caused a given sequence of
y-selections is

$$ S_{y;x}(n) = 2^{n H_{y;x}} . \tag{16} $$
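
The count of Eq. 14 can be illustrated for a binary source. The sketch below
is an illustration under assumed values: the probabilities (0.9, 0.1) and the
function names are not from the report. It counts the binary sequences of
length n in which the second choice occurs exactly n p(1) times and compares
the logarithm of that count, per selection, with $H_x$.

```python
import math


def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def bits_per_selection(n, k):
    """(1/n) log2 of the number of binary sequences of length n with k ones."""
    return math.log2(math.comb(n, k)) / n


if __name__ == "__main__":
    p = (0.9, 0.1)                      # assumed source probabilities
    for n in (100, 1000, 10000):
        k = round(n * p[1])             # the second choice occurs n p(1) times
        print(n, bits_per_selection(n, k), entropy(p))
```

The per-selection count approaches $H_x$ from below as n grows, whereas the
ensemble of all binary sequences corresponds to one binary unit per selection.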

The significance of the groups of sequences defined above is indicated
schematically in Fig. 1. It is clear that, when n is very large, we can
neglect in our reasoning all sequences not belonging to the above groups.
In fact, the number of binary selections required on the average to transmit
or to correct the transmission of the sequences not belonging to these
groups is negligible compared to the number required for the others,
regardless of the code used. On this basis we can rewrite Eq. 7 in the form

$$ H_R = \frac{1}{n} \log_2 \frac{S_x(n)}{S_{y;x}(n)} \tag{17} $$

and interpret it as 1/n of the amount of information provided by the
specification of which group of $S_{y;x}(n)$ sequences of x-selections contains
the transmitted sequence. The second expression for $H_R$ in Eq. 13 can be
interpreted as 1/n of the information provided by the reception of a y-sequence
when it is known that $S_{x;y}(n)$ different y-sequences can be produced by the
same x-sequence. The third expression for $H_R$ is a measure of the dependence
of the y-sequences on the x-sequences. It is 1/n of the difference between
the number of binary selections required to specify both an x-sequence and a
y-sequence if they were independent, and the number required to specify one
of the possible pairs of x- and y-sequences. If there were no noise we would
have

$$ H_x = H_y = H_{xy} \tag{18} $$

because a y-sequence would be completely specified by the corresponding
x-sequence.

Fig. 1. Illustration of the limiting case of transmission through noise
(transmitted sequences and received sequences).

The above analysis was limited to the case of sequences of independent
selections. The corresponding results for the general case of non-independent
selections would have more complex mathematical forms, but their
significance would not change materially. Equations 9 and 13 are still valid in
the general case, provided the proper expressions are used for $H_x$, $H_y$,
$H_{y;x}$, $H_{x;y}$ and $H_{xy}$. The amounts of information represented by $H_x$, $H_y$ and
$H_{y;x}$ are never increased by a correlation between successive x-selections,
as shown in Sect. 5 of Technical Report No. 65. It will be remembered that
$H_x$ represents the information conveyed by the knowledge of x, $H_y$ represents
the information (about y) conveyed by the knowledge of y, and $H_{y;x}$
represents the information conveyed by the knowledge of x when y is already
known. $H_{x;y}$ represents the information (about y) conveyed by the knowledge
of y when x is already known, and thus depends only on the noise and on the
first probabilities p(x). It follows that $H_R$, the information about x
resulting from the knowledge of y, is never increased by any correlation
between successive x-selections, as indicated by Eq. 13. This general
result is important in the following determination of the capacity of a
given channel.


2. Channel capacity

The preceding section has been devoted to the evaluation of the
information received in the presence of noise, when the statistical characters
of the signal transmitted as well as of noise are specified. We now wish
to determine the maximum rate at which information can be received through
a channel able to handle a specified number N of choices and disturbed by a
noise whose statistical character is also specified. In other words, we
wish to determine the capacity of a channel having specified
characteristics. Mathematically speaking, this problem amounts to the maximisation
of the quantity $H_R$ with respect to the p(x), subject to the constraint

$$ \sum_x p(x) = 1 . \tag{19} $$

The p(x;y), which represent the characteristics of the noise, are held
constant in this maximisation process. The x-selections may be assumed to
be independent, because, as shown above, $H_R$ cannot be increased by any
correlation between them. Using Lagrange's method, we equate to zero the
partial derivatives of the function


$$ H_R + \lambda \sum_x p(x) = H_y - H_{x;y} + \lambda \sum_x p(x) . \tag{20} $$

The constant $\lambda$ is the indeterminate multiplier. Taking the partial
derivatives of Eq. 20 with respect to the p(x) we obtain a set of equations, one
per value of x, of the form

$$ -\frac{1}{\ln 2} \sum_y p(x;y) \left[ 1 + \ln p(y) \right] + \sum_y p(x;y) \log_2 p(x;y) + \lambda = 0 . \tag{21} $$

These equations are more conveniently written in the form

$$ -\sum_y p(x;y) \log_2 p(y) = C + H_y(x) \tag{22} $$

where

$$ C = -\lambda + \frac{1}{\ln 2} \sum_y p(x;y) = \frac{1}{\ln 2} - \lambda $$

and

$$ H_y(x) = -\sum_y p(x;y) \log_2 p(x;y) . $$

We observe first that C is the capacity of the channel, that is, the maximum
value of $H_R$. By multiplying each term of Eq. 22 by p(x) and adding for all
values of x we obtain

$$ -\sum_x \sum_y p(x)\, p(x;y) \log_2 p(y) - \sum_x p(x)\, H_y(x) = H_y - H_{x;y} = C . \tag{24} $$

To determine the set of p(x) that yields the maximum value of $H_R$, that is,
C, it is convenient to write the set of Eqs. 22 in matrix form as follows

$$ [p(x;y)] \cdot [\log_2 p(y)] = -[C + H_y(x)] . \tag{25} $$

[p(x;y)] is a square matrix representing the noise characteristics, in which
the terms of any row correspond to the same x and the terms of any column
correspond to the same y. Solving this matrix equation for the column
matrix $[\log_2 p(y)]$ yields

$$ [\log_2 p(y)] = -[h(x;y)] \cdot [C + H_y(x)] . \tag{26} $$

[h(x;y)] is another square matrix in which the elements of each column are
the cofactors of the corresponding row in [p(x;y)] divided by the
determinant |p(x;y)|. In matrix notation

$$ [h(x;y)] = [p(x;y)]^{-1} . \tag{27} $$

If x and y can assume only two values, 0 and 1,

$$ [p(x;y)] = \begin{bmatrix} p(0;0) & p(0;1) \\ p(1;0) & p(1;1) \end{bmatrix} \tag{28} $$

$$ [h(x;y)] = \begin{bmatrix} h(0;0) & h(1;0) \\ h(0;1) & h(1;1) \end{bmatrix} \tag{29} $$

where, for instance,

$$ h(0;1) = \frac{-p(1;0)}{p(0;0)\, p(1;1) - p(0;1)\, p(1;0)} . \tag{30} $$

We can now express p(y) in the form

$$ p(y) = 2^{-\sum_x h(x;y) [C + H_y(x)]} . \tag{31} $$



In order to express p(x) as a function of the p(y) and the p(x;y), we
observe that

$$ p(y) = \sum_x p(x)\, p(x;y) \tag{32} $$

or, in matrix notation,

$$ [p(y)] = [p(x;y)]_t \cdot [p(x)] \tag{33} $$

where $[p(x;y)]_t$ is obtained from [p(x;y)] by interchanging rows with
columns. We have then

$$ [p(x)] = [h(x;y)]_t \cdot [p(y)] \tag{34} $$

or

$$ p(x) = \sum_y h(x;y)\, p(y) . \tag{35} $$

Finally, the p(x) corresponding to the maximum value of $H_R$, that is, to C,
are given by

$$ p(x) = \sum_y h(x;y)\, 2^{-\sum_{x'} h(x';y) [C + H_y(x')]} \tag{36} $$

in which the inner summation index is written x' to distinguish it from the
argument of p(x). On the other hand, the value of the capacity C must
satisfy Eq. 19 or the equivalent constraint

$$ \sum_y p(y) = 1 . \tag{37} $$

Substituting Eq. 31 for the p(y) yields

$$ \sum_y 2^{-\sum_x h(x;y) [C + H_y(x)]} = 1 \tag{38} $$

from which C can be determined, all other quantities in this equation being
known.
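
For a channel with only two choices, Eq. 38 can be solved numerically. The
following is a sketch under stated assumptions, not a method given in the
report: bisection is used as a convenient root finder, the search interval
is taken as 0 to 1 binary unit, the noise matrix is assumed non-singular,
and all function names are illustrative.

```python
import math


def row_entropy(row):
    """H_y(x): entropy in bits of one row of the noise matrix."""
    return -sum(p * math.log2(p) for p in row if p > 0)


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def lhs_eq38(c, noise):
    """Left-hand side of Eq. 38 for a trial capacity c (two choices only)."""
    inv = inverse_2x2(noise)            # inv[y][x] plays the role of h(x;y)
    h = [row_entropy(row) for row in noise]
    return sum(2.0 ** (-sum(inv[y][x] * (c + h[x]) for x in range(2)))
               for y in range(2))


def capacity_binary(noise, tol=1e-12):
    """Solve Eq. 38 for C by bisection on the interval from 0 to 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs_eq38(mid, noise) > 1.0:  # the p(y) sum to more than 1: raise C
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(capacity_binary([[0.99, 0.01], [0.01, 0.99]]))
```

The sketch does not verify that the p(x) obtained from Eq. 36 are
non-negative; that check would have to be added before trusting the result for
a strongly asymmetric noise matrix.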

Because of the fact that Eq. 38 is a transcendental equation in the
unknown C, it is difficult to visualize how C depends on the noise.
Fortunately, all difficulties disappear if $H_y(x)$ is independent of x, that is,
roughly speaking, if the noise has the same disturbing effect on all x.
This happens, for instance, in the case of binary pulse transmission if the
probability of a pulse being received as an empty space is equal to the
probability of an empty space being received as a pulse. In any such
case

$$ H_{x;y} = \sum_x p(x)\, H_y(x) \tag{39} $$


is independent of the p(x); that is, $H_{x;y}$ depends only on the noise and not
at all on the signal. It follows then from

$$ H_R = H_y - H_{x;y} \tag{40} $$

that the maximum value of $H_R$ is obtained when $H_y$ is a maximum, that is, when
all choices y are a priori equally likely to be received. If the number of
choices is N we have then for the channel capacity

$$ C = \log_2 N - H_{x;y} \tag{41} $$

and for the values of the p(x)

$$ p(x) = \frac{1}{N} \sum_y h(x;y) . \tag{42} $$


Considering as an example the case of binary pulse transmission mentioned
above, and letting a be the probability of an empty space being received as
a pulse and vice versa, we obtain

$$ C = 1 + \left[ a \log_2 a + (1 - a) \log_2 (1 - a) \right] \tag{43} $$

and

$$ [p(x;y)] = \begin{bmatrix} 1-a & a \\ a & 1-a \end{bmatrix} \tag{44} $$

$$ [h(x;y)] = \frac{1}{1-2a} \begin{bmatrix} 1-a & -a \\ -a & 1-a \end{bmatrix} . \tag{45} $$

The p(x) are

$$ p(0) = p(1) = \frac{1}{2} \tag{46} $$

as expected, because of the symmetry of the problem. In general, the p(x)
are different, even when the $H_y(x)$ are equal, unless the sets of p(x;y) for
any given x consist of the same set of numbers permuted with respect to y.

For a noise corresponding to a probability of error a = 0.01, the
channel capacity becomes approximately 0.9, that is, 90 percent of its value
in the absence of noise. It is interesting to note, in this regard, that
the probability of error is far from being a direct measure of the loss of
information. In the most general case of arbitrary $H_y(x)$ the order of
magnitude of the channel capacity can be quickly evaluated by noting that


$$ \log_2 N - [H_y(x)]_{av} < C < \log_2 N - [H_y(x)]_{min} \tag{47} $$

where $[H_y(x)]_{min}$ is the smallest $H_y(x)$ of the set, and

$$ [H_y(x)]_{av} = \frac{1}{N} \sum_x H_y(x) . \tag{48} $$
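
The remark that the probability of error is not a direct measure of the loss
of information can be made concrete by evaluating Eq. 43 for a few values of
a. This is a minimal sketch; the function name and the sample values of a are
illustrative assumptions.

```python
import math


def capacity_symmetric_binary(a):
    """Eq. 43: capacity in binary units per selection for error probability a."""
    if a in (0.0, 1.0):
        return 1.0                      # a log2 a tends to 0 in the limit
    return 1.0 + a * math.log2(a) + (1.0 - a) * math.log2(1.0 - a)


if __name__ == "__main__":
    for a in (0.001, 0.01, 0.1, 0.5):
        c = capacity_symmetric_binary(a)
        print(f"a = {a:<6} C = {c:.4f}  loss = {1.0 - c:.4f}")
```

For a = 0.01 the capacity is about 0.92, so an error probability of 1 percent
costs about 8 percent of the capacity, and the capacity vanishes at a = 0.5.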


3. Ideal transmission and the reduction of the equivocation. 


The discussion presented in the preceding sections has led to the 
determination of a theoretical limit for the amount of information per 
selection that can be received in the presence of a given noise. This limit 
on the rate of transmission of information, however, does not provide a 
complete solution to the problem of interest, even from a theoretical point 
of view. In fact our problem is not simply to receive as much information 
as possible, but also to receive as much of the information transmitted as
possible. In other words, in addition to knowing the upper limit for the
information received, $H_R$, we would also like to know the lower limit of the
ratio $H_{y;x}/H_x$, which represents the per-unit equivocation or per-unit
information lost.

In order to clarify this point, suppose we wished to transmit
independent selections, as in the case of a telegraphic message consisting of
numbers rather than words. As long as $H_R$ is different from zero, we could
theoretically transmit correctly all the information desired using a minimum
number of binary selections. This could be done by repeating over and over
the same message, using each time a more efficient code as indicated in
Sect. 2. The main difficulty arising in such a procedure, as pointed out
also in Sect. 1, is that after each transmission the message received would
have to be retransmitted back so that a more efficient code could be used in
the successive transmissions. This method of transmission is likely to be
impractical. What we would like to do instead is to perform only one
transmission, using as many binary selections as necessary to insure, for any
practical purposes, the correct reception of the message. It is clear that
this can be done when the noise is sufficiently low, as in ordinary
telegraph transmission. We would like to know whether or not this can be done
in general as long as $H_R$ is different from zero. In other words, we would
like to know whether information can be transmitted in such a way as to
decrease the equivocation produced by noise and, if so, what is the lower
theoretical limit for the equivocation. It will be shown below that the
answer to this question is yes and that the theoretical limit for the
equivocation is zero. In other words, the probability of incorrect
reception of a message can be made as small as desired; and the number of binary
selections used in the transmission does not need to be larger, on the
average, than the information content of the message divided by $H_R$. The price
for this reduction of the equivocation is paid in terms of a more involved
coding of the information, and a correspondingly longer delay.

The manner in which such a result can be achieved, or at least
approached, is suggested by what people often do instinctively when
communicating by means of speech or visual indications. Suppose two persons must
exchange some information while in a place so noisy that normal conversation
is impossible, and that they cannot see each other so that any visual form
of communication is precluded. What is often done in such a case is to
decide ahead of time that only a certain very restricted group of words will
be used, the words being selected in such a way that they have the least
chance of being confused with one another. More specifically, suppose a
man in a factory must indicate to another man whom he cannot see when to
start or stop a machine. He certainly will not use such words as start and
stop but rather go and stop or any two other conventional sounds that differ
considerably from each other. The man who receives the information would
not, in general, be able to recognize the words spoken if he had to
differentiate between all possible words instead of the restricted number agreed
upon. In other words, the possibility of his receiving correctly the 
information transmitted is based on his ability to decide that the sound 
heard is more similar to one word of the group agreed upon than to any 
other word of the same group. The larger is the noise, the more different 
the words of the group must be and, therefore, the smaller the number of 
words in the group. 

Let us adapt the idea involved in this human communication process to
the case of communication by means of sequences of selections. It is clear
that we must select out of the ensemble of all possible sequences of a
given length a group of sequences which, when transmitted, have the least
probability of being confused with one another. More precisely, suppose we
select M sequences out of the ensemble of $S = N^n$ possible sequences of n
N-order selections and we consider these M sequences as forming a new ensemble
of choices from which to select. Let us indicate with $\xi$ the transmitted
sequence of this ensemble and with $\eta$ the received sequence. It must be
noted that, while the transmitted sequence belongs to the group of M
selected, the received sequence may be any one of the S possible sequences
of the same length. Each sequence $\xi$ has a probability $p(\xi)$ of being
transmitted and a conditional probability $p(\xi;\eta)$ of being transformed into
a particular sequence $\eta$ by the noise. From these two sets of probabilities
we can obtain the conditional probability

$$ p(\eta;\xi) = \frac{p(\xi)\, p(\xi;\eta)}{\sum_\xi p(\xi)\, p(\xi;\eta)} \tag{49} $$

that sequence $\xi$ was transmitted when sequence $\eta$ is received. In terms of
these probabilities, the average amount of information transmitted per
sequence is

$$ H_\xi = -\sum_\xi p(\xi) \log_2 p(\xi) . \tag{50} $$

The equivocation, that is, the amount of information lost, is

$$ H_{\eta;\xi} = -\sum_\xi \sum_\eta p(\eta)\, p(\eta;\xi) \log_2 p(\eta;\xi) \tag{51} $$

where

$$ p(\eta) = \sum_\xi p(\xi)\, p(\xi;\eta) . \tag{52} $$
£ 


The improvement obtained by limiting the number of sequences that are
transmitted is measured by the decrease of the per-unit equivocation
$E = H_{\eta;\xi}/H_\xi$. Of course E becomes equal to $H_{y;x}/H_x$ when all possible
sequences are transmitted. It will be shown below that, in all cases, the
per-unit equivocation can be made as small as desired by making the number n
of N-order selections in each sequence sufficiently large. At the same time the
amount of information actually transmitted can be made to approach in the
limit the value nC where C is the capacity of the channel, determined in
Sect. 2. In other words, it is possible in theory to design a perfect
communication system in which, while using the full capacity of a noisy
channel, all the information transmitted is received correctly; that is,
the probability of incorrect reception is zero. This fundamental theorem
was first stated and proved by Shannon (2).

It was pointed out in Sect. 2 that the problem of transmission through
noise becomes much easier to handle in the limiting case when the length
n of the sequences approaches infinity. In this case, in fact, the
ensemble of sequences that may be formed at the transmitter can be divided
in two groups, whose total probabilities of occurrence are one and zero
respectively. The group with probability one contains (see Eq. 14)

$$ S_x(n) = 2^{n H_x} $$

sequences in all of which the different choices appear with frequencies
equal to their probabilities, and which, therefore, are equally probable.
The same is true for the received sequences, in which case the group with
unity probability contains

$$ S_y(n) = 2^{n H_y} $$

equally probable sequences. Similarly, each transmitted sequence (belonging
to the probable group) has unity probability of being transformed by noise
into any one of a group of

$$ S_{x;y}(n) = 2^{n H_{x;y}} $$

sequences which, in turn, are equally probable (see Eq. 15). Conversely,
when a particular sequence (belonging to the probable group) is received,
the corresponding sequence transmitted belongs, with unity probability, to
a group of

$$ S_{y;x}(n) = 2^{n H_{y;x}} $$

sequences which are also equally probable (see Eq. 16). This situation is
summarized schematically in Fig. 1 in which each point represents an
equally probable sequence and each line an equally probable transformation
caused by noise.

Figure 1 indicates clearly that if the transmitted sequence is allowed
to be any one of the S_x(n) sequences belonging to the probable group, all
that the receiver will be able to determine is a group of S_{y;x}(n)
sequences to which the transmitted sequence must belong. Suppose, however,
that only a certain restricted group of M sequences are allowed to be
transmitted. It might then be possible to select the sequences of this
group in such a way that of the group of S_{y;x}(n) sequences that the
receiver determines only one sequence belongs to the restricted group. In
this case no uncertainty would exist as to the sequence transmitted. This
is the same as requiring that no two sequences of the M selected can be
transformed by noise into the same sequence. Since each transmitted
sequence may be transformed into S_{x;y}(n) sequences, and since there are
S_y(n) different sequences that may be received, it must be

    M \le S_y(n)/S_{x;y}(n) = S_x(n)/S_{y;x}(n) = 2^{nH_R}          (53)
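As a numerical illustration of Eq. 53, the following Python sketch
evaluates the exponents for the binary case treated later in Section 4:
equally likely choices, either of which is changed by noise with
probability q. The function names are illustrative only and do not appear
in the report.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of a binary choice with probabilities q and 1 - q."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def sequence_exponents(q: float) -> dict:
    """Per-choice entropies for equally likely binary choices, symmetric noise."""
    h_x = 1.0                     # transmitted choices are equally likely
    h_y = 1.0                     # received choices are then equally likely too
    h_x_y = binary_entropy(q)     # H_{x;y}: spread of the received choice
    h_r = h_y - h_x_y             # rate of reception H_R
    h_y_x = h_x - h_r             # H_{y;x}: equivocation
    return {"H_x": h_x, "H_y": h_y, "H_x;y": h_x_y, "H_y;x": h_y_x, "H_R": h_r}


def max_sequences_log2(n: int, q: float) -> float:
    """log2 of the bound on M in Eq. 53, that is n * H_R."""
    return n * sequence_exponents(q)["H_R"]
```

For q = 0.11 the exponent H_R is close to one-half, so that the bound on M
is roughly 2^{n/2}.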


It will be shown below that the ratio

    M / 2^{nH_R}

can be made as close to unity as desired. Unfortunately no direct way has
been found to select the M sequences in the optimum manner. All we can do,
therefore, is to show that such an optimum selection must exist. This proof
will be carried out by averaging (over all possible groups of M sequences
that can be formed) the probability that when a particular sequence is
received there is an uncertainty as to the sequence transmitted. If this
average probability is equal to p there exists at least one group of M
sequences for which the probability is not larger than p. We shall find
that p can be made arbitrarily small as long as M is smaller than

    2^{nH_R}

We have seen that the reception of a particular sequence specifies a
group of S_{y;x}(n) sequences to which the transmitted sequence must
belong. Let us consider then all the possible groups of M sequences that
could have been formed and compute the probability p that, if the group
used was selected at random, it contains at least one sequence common to
the group of S_{y;x}(n) sequences in addition to the one actually
transmitted. Since one sequence of the M is specified, the remaining M - 1
may be any of the other S_x(n) - 1 sequences available at the transmitter.
Thus the probability that a particular sequence is one of these M - 1 is

    p_1 = \frac{M - 1}{S_x(n) - 1}                                  (54)


This is the same as the probability of drawing a black ball out of an urn
containing S_x(n) - 1 balls of which M - 1 are black. The probability for
two given sequences to be either one or both present among the M - 1 is

    p_2 = 2p_1 - \frac{(M - 1)(M - 2)}{(S_x(n) - 1)(S_x(n) - 2)}    (55)

It follows that

    p_2 < 2p_1                                                      (56)

The reason for this inequality is that 2p_1 contains twice the probability
that both sequences will be present simultaneously. For the same reason we
obtain in general

    p_m < m p_1                                                     (57)


where p_m is the probability that any one or all of m given sequences are
present among the M - 1. Therefore, in our particular case, the probability
p that one or more of the sequences belonging to the group of
(S_{y;x}(n) - 1) will be present among the M - 1 sequences must be

    p < (S_{y;x}(n) - 1) \frac{M - 1}{S_x(n) - 1} < M \frac{S_{y;x}(n)}{S_x(n)}    (58)

Finally, using Eqs. 14, 16 and 17 we obtain

    p < M 2^{-nH_R}                                                 (59)
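The chain of inequalities in Eq. 58 can be checked numerically for small
groups. The sketch below assumes that the M - 1 remaining sequences are
drawn at random, without replacement, from the other S_x(n) - 1 (the urn
picture used above). The function names and the sample numbers are
illustrative and are not taken from the report.

```python
from math import comb


def p_exact(s_x: int, s_yx: int, m_group: int) -> float:
    """Probability that a random group of m_group sequences, which contains
    the transmitted one, also contains another member of the S_{y;x} group."""
    others = s_yx - 1                       # confusable sequences besides the true one
    none_present = comb(s_x - 1 - others, m_group - 1) / comb(s_x - 1, m_group - 1)
    return 1.0 - none_present


def p_union_bound(s_x: int, s_yx: int, m_group: int) -> float:
    """First bound of Eq. 58: (S_{y;x} - 1)(M - 1)/(S_x - 1)."""
    return (s_yx - 1) * (m_group - 1) / (s_x - 1)


def p_loose_bound(s_x: int, s_yx: int, m_group: int) -> float:
    """Second bound of Eq. 58: M S_{y;x} / S_x."""
    return m_group * s_yx / s_x


if __name__ == "__main__":
    args = (1024, 8, 16)                    # S_x, S_{y;x}, M chosen for illustration
    print(p_exact(*args), p_union_bound(*args), p_loose_bound(*args))
```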


It will be recalled that the M sequences forming the selected group
are equally likely to be transmitted so that any one of them would convey
an amount of information equal to log_2 M. Since each sequence consists of
n selections from the original K choices the amount of information per
choice is

    H_T = \frac{1}{n} \log_2 M                                      (60)

The inequality of Eq. 59 can then be written in the form

    p < 2^{-n(H_R - H_T)}                                           (61)


This equation shows that p can be made as small as desired by increasing n
(the length of the sequences) as long as H_T is smaller than H_R. On the
other hand p is the average (over all possible groups of M sequences)
probability that when a particular sequence is received a doubt will exist
as to the sequence transmitted. Therefore, there must be at least one
group of M sequences for which this probability can be made as small as
desired, provided H_T < H_R. Using the symbolism introduced earlier in this
section we have, when n approaches infinity and H_T < H_R


    p(\xi) = 1/M ,    H_\xi = n H_T                                 (62)

    p(\eta;\xi) = 1 or 0                                            (63)


Clearly, the per-unit equivocation vanishes when n approaches infinity
provided (H_R - H_T) > ε where ε is an arbitrarily small positive constant.
The maximum value that H_R can assume is the capacity C of the channel.
Therefore, it is possible in theory to design a communication system which
transmits information at a rate as close as desired to the capacity of the
channel available, with a vanishingly small per-unit equivocation.
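A rough idea of the sequence lengths implied by Eq. 61 may be obtained as
follows. The sketch simply solves 2^{-n(H_R - H_T)} <= p for n; the
function names and the sample figures are assumptions made for
illustration.

```python
import math


def bound_on_p(n: int, h_r: float, h_t: float) -> float:
    """Right-hand side of Eq. 61."""
    return 2.0 ** (-n * (h_r - h_t))


def length_for_target(p_target: float, h_r: float, h_t: float) -> int:
    """Smallest n for which the bound of Eq. 61 does not exceed p_target."""
    margin = h_r - h_t
    if margin <= 0:
        raise ValueError("Eq. 61 is useful only when H_T is smaller than H_R")
    return math.ceil(math.log2(1.0 / p_target) / margin)


if __name__ == "__main__":
    # Halving the margin H_R - H_T doubles the required sequence length.
    for h_t in (0.40, 0.45):
        print(h_t, length_for_target(1e-3, 0.5, h_t))
```

The required length is inversely proportional to the margin H_R - H_T,
which is one way of seeing why rates close to the capacity call for very
long sequences.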

The situation can be summarized as follows. The statistical
characteristics of the channel noise specify a set of transmitter choice
probabilities p(x), for which H_R takes its maximum value C. If the number
n of selections is sufficiently large, the problem reduces to that of
S_x(n) equally probable ξ-sequences, each of which may be transformed by
noise into any one of a group of S_{x;y}(n) η-sequences with equal
probabilities. It is possible then to select approximately 2^{nC}
ξ-sequences out of the S_x(n) in such a way that the corresponding groups
of S_{x;y}(n) η-sequences do not have any sequences in common. If only
these 2^{nC} sequences are allowed to be transmitted, no uncertainty will
exist at the receiver as to the sequence transmitted each time. By coding
information in these sequences in such a way as to make them equally
likely to be transmitted, the amount of information transmitted and
received per sequence can be made as close as desired to nC.


4. Coding for reduction of the per-unit equivocation. 

It is hardly necessary to say that the ideal performance discussed 
above cannot be obtained in practice because it would require an extremely 
complex coding operation. The messages to be transmitted would have to be 
coded before transmission into very long (theoretically infinite) sequences 
of selections which in turn would have to be decoded after reception. The 
implications of this requirement go beyond the obvious practical considera- 
tions of equipment complexity. In fact no message or part of a message can 
be known precisely at the receiver until a whole sequence has been received 
and decoded. This does not mean that information is not received all the
time, but rather that the information being received specifies more and 
more closely all the intelligence coded in the sequence. For instance, the 
noise disturbance may be such that a considerable amount of uncertainty 
concerning the first few selections of a sequence may exist until the last 
few selections are received. If we assume that the time required by the 
coding, transmission and decoding of a sequence is proportional to the 
length of the sequence itself, we arrive at the conclusion that any decrease
of per-unit equivocation costs an additional time delay in the overall 
transmission process. In the limit, zero equivocation requires the use of 
infinitely long sequences which in turn implies an infinite time delay. It 
is natural to ask in such a situation what is the minimum per-unit equivo- 
cation that can be obtained with a given channel for specified length of 
sequences and specified rate of transmission of information. In symbols, 
what is the minimum value of E that can be obtained for given n, H_ξ/n and
channel characteristics? Unfortunately this question has not yet been
answered, although it is reasonable to expect that the larger the difference
C - H_ξ/n, the smaller n needs to be for a given per-unit equivocation.


Another important consideration can be made regarding the performance
of the communication system discussed above. Suppose that n has been made
sufficiently large so that the per-unit equivocation is negligibly small
for (H_ξ/n - C_0) < ε, where ε is a very small number. Then consider the
behavior of the per-unit equivocation when the capacity of the channel C is
varied by changing the noise. The per-unit equivocation is negligible for
any value of C larger than C_0, but when C becomes smaller than C_0, the
per-unit equivocation must be

    E > 1 - \frac{C}{C_0}                                           (64)

since C is by definition the maximum rate at which information can be
received. In other words, the slope of the curve of the per-unit
equivocation as a function of C/C_0 is discontinuous at the point
C/C_0 = 1. This is
essentially the threshold phenomenon common to all noise-reducing systems
such as frequency modulation, pulse code modulation, etc. The behavior of
the per-unit equivocation as a function of H_{y;x} is illustrated in Fig. 2
for two typical coding schemes.

Fig. 2. The reduction of per-unit equivocation in two simple cases.
(Abscissa: H_{y;x}.)

The system considered employs binary equally likely selections and the
noise is assumed to be such that either choice has a probability p of not
being transformed into the other choice by noise and, of course, q = 1 - p
of being transformed. In the case of n = 2 the two sequences 00 and 11
constitute an optimum set of M = 2 equally probable ξ-sequences, in which
case H_ξ = 1. One obtains for H_η, H_{ξ;η} and H_{η;ξ}


    H_\eta = -\sum_\eta p(\eta) \log_2 p(\eta)
           = -[2pq \log_2 2pq + (1 - 2pq) \log_2 (1 - 2pq) - 1]     (65)

    H_{\xi;\eta} = 2H_{x;y} = -2(p \log_2 p + q \log_2 q)           (66)

    H_{\eta;\xi} = 1 - (H_\eta - H_{\xi;\eta})                      (67)


In the case of n = 3 the sequences 000, 011, 110, 101 constitute an optimum
set of M = 4 equally probable ξ-sequences, for which H_ξ = 2. We obtain for
H_η, H_{ξ;η} and H_{η;ξ}

    H_\eta = -\{(q^3 + 3p^2 q) \log_2 (q^3 + 3p^2 q)
             + [1 - (q^3 + 3p^2 q)] \log_2 [1 - (q^3 + 3p^2 q)] - 2\}    (68)

    H_{\xi;\eta} = 3H_{x;y} = -3(p \log_2 p + q \log_2 q)           (69)

    H_{\eta;\xi} = 2 - (H_\eta - H_{\xi;\eta})                      (70)
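Equations 65 to 70 can be verified by direct enumeration. The sketch
below, with illustrative names not found in the report, computes H_η by
summing over all received sequences and compares it with the closed forms
of Eqs. 65 and 68. It assumes that each digit is changed independently
with probability q.

```python
import math
from itertools import product


def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def per_unit_equivocation(code, q):
    """Return (E, H_eta) for equally likely code sequences sent over a binary
    channel in which each digit is changed independently with probability q."""
    n, m_group, p = len(code[0]), len(code), 1.0 - q
    eta_probs = []
    for eta in product("01", repeat=n):
        total = 0.0
        for xi in code:
            d = sum(a != b for a, b in zip(xi, eta))    # digits changed by noise
            total += (q ** d) * (p ** (n - d)) / m_group
        eta_probs.append(total)
    h_eta = entropy(eta_probs)
    h_xi_eta = n * entropy([p, q])                      # Eqs. 66 and 69
    h_xi = math.log2(m_group)
    h_eta_xi = h_xi - (h_eta - h_xi_eta)                # Eqs. 67 and 70
    return h_eta_xi / h_xi, h_eta


def h_eta_eq65(q):
    a = 2 * (1 - q) * q
    return -(a * math.log2(a) + (1 - a) * math.log2(1 - a) - 1)


def h_eta_eq68(q):
    a = q ** 3 + 3 * (1 - q) ** 2 * q
    return -(a * math.log2(a) + (1 - a) * math.log2(1 - a) - 2)


if __name__ == "__main__":
    for q in (0.01, 0.05, 0.1, 0.2):
        e2, _ = per_unit_equivocation(["00", "11"], q)
        e3, _ = per_unit_equivocation(["000", "011", "110", "101"], q)
        print(q, entropy([q, 1 - q]), e2, e3)
```

The printed columns are q, H_{y;x} and the values of E for the two
schemes, that is, the quantities plotted in Fig. 2.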


Figure 2 shows for the two cases considered above the behavior of E,
the per-unit equivocation, as a function of H_{y;x}, the equivocation for
the individual binary selections, which is a measure of the noise present
in the system. It will be observed that both curves show the beginning of a
threshold phenomenon, in the sense that the slope increases with H_{y;x}.
The reduction of per-unit equivocation is given by the difference between
the ordinates of the curves.

The labor involved in considering in the same manner more complex 
coding schemes becomes increasingly prohibitive as the value of n increases. 
On the other hand it would seem worth while to determine, in an approximate 
fashion at least, how fast the per-unit equivocation decreases with increas- 
ing n for a constant rate of transmission of information. Similarly it 
would be very instructive to see how the threshold phenomenon becomes more 
pronounced with increasing n. Unfortunately, the behavior of the per-unit 
equivocation does not seem to be representable by means of a reasonably 
simple mathematical expression, even in an approximate manner. But it has 
been possible to compute a few representative curves for a typical case of 
possible practical importance. 

Figure 3 illustrates the behavior of the per-unit equivocation after
coding (E = H_{η;ξ}/H_ξ) as a function of the channel equivocation
(H_{y;x}) for binary equally likely selections. Either binary choice has
the same probability q of being received incorrectly. H_{y;x} = 1 - C is a
measure of the noise present in the channel. The figures on the curves,
n = 5, 10, 15, 20, 67, indicate the length (number of binary selections)
of the sequences used in the coding scheme. For all curves the rate of
transmission H_ξ/n is kept constant and equal to 1/5; that is, the
transmission takes place at a rate equal to one-fifth of the maximum rate
that would be possible in the absence of noise. The straight line marked
n = ∞ represents the theoretical boundary of the region over which the
rate at which information is received is smaller than the capacity of the
channel, (1/5)(1 - E) \le C. The manner in which the curves have been
computed is discussed below.

Fig. 3. Per-unit equivocation as a function of channel equivocation.
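The limiting boundary just described follows from the condition
(1/5)(1 - E) <= C with C = 1 - H_{y;x}. A minimal Python sketch, in which
the function name and the default rate argument are illustrative:

```python
def limiting_equivocation(h_channel: float, rate: float = 0.2) -> float:
    """Smallest per-unit equivocation allowed by rate * (1 - E) <= C,
    with C = 1 - h_channel for binary equally likely selections."""
    capacity = 1.0 - h_channel
    return max(0.0, 1.0 - capacity / rate)


if __name__ == "__main__":
    for h in (0.5, 0.8, 0.9, 1.0):
        print(h, limiting_equivocation(h))
```

The value is zero up to H_{y;x} = 0.8 and then rises linearly to unity at
H_{y;x} = 1, which is the broken line of the threshold phenomenon
expressed by Eq. 64 with C_0 = 1/5.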

Let us consider sequences of n binary selections and think of them,
for convenience, as numbers written in the binary system. Let q be the
probability that any one digit be transmitted incorrectly; p = 1 - q is
then the probability that any one digit be transmitted correctly. The
probability that a number ξ with n digits be received as a number η is
given by

    p^m q^{n-m}

where m is the number of digits that are the same in the two numbers. The
number of η-numbers which have m digits in common with a particular number
ξ is

    \frac{n!}{(n-m)!\,m!}

It follows that when ξ is transmitted, the probability that an η is
received which has in common with ξ less than k digits, is

    Q_k = \sum_{m=0}^{k-1} \frac{n!}{(n-m)!\,m!}\,p^m q^{n-m}
        = q^n + n q^{n-1} p + \frac{n(n-1)}{2!}\,q^{n-2} p^2 + \dots
          + \frac{n(n-1)\cdots(n-k+2)}{(k-1)!}\,q^{n-k+1} p^{k-1}   (71)


This expression is easily recognized as a partial sum of the terms of the
binomial (q + p)^n = 1. This partial sum can be expressed in integral form
as

    Q_k = I_q(n-k+1, k)
        = \frac{\int_0^q x^{n-k} (1-x)^{k-1}\,dx}{\int_0^1 x^{n-k} (1-x)^{k-1}\,dx}    (72)

The definite integral at the denominator is known as the beta-function and
I_q(r,k) is known as the normalized incomplete beta-function. In our case
r = n - k + 1 and r + k = n + 1. The identity of I_q(r,k) with the partial
sum of Eq. 71 can be easily checked by integrating the numerator of Eq. 72
by parts. An extensive tabulation of I_q(r,k) is available (4) for integral
values of r and k up to 50, and for values of q in steps of 0.01. Tables
of q for given values of I_q(r,k) and series expressions for computing
I_q(r,k) are also available in the literature (5,6).
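The identity between Eq. 71 and Eq. 72 can also be checked exactly for
integral r and k. The sketch below uses rational arithmetic and integrates
the polynomial integrand term by term; this is merely a convenient check,
not the series method of the references, and the function names are
illustrative.

```python
from fractions import Fraction
from math import comb


def q_k_partial_sum(n: int, k: int, q):
    """Eq. 71: probability that fewer than k of the n digits are correct."""
    p = 1 - q
    return sum(comb(n, m) * p ** m * q ** (n - m) for m in range(k))


def incomplete_beta_integral(x, r: int, k: int):
    """Integral from 0 to x of t^(r-1) (1-t)^(k-1) dt for integral r and k,
    obtained by expanding (1-t)^(k-1) and integrating each power of t."""
    return sum(Fraction((-1) ** j * comb(k - 1, j), r + j) * x ** (r + j)
               for j in range(k))


def q_k_beta(n: int, k: int, q):
    """Eq. 72: the normalized incomplete beta-function I_q(n-k+1, k)."""
    r = n - k + 1
    return (incomplete_beta_integral(q, r, k)
            / incomplete_beta_integral(Fraction(1), r, k))


if __name__ == "__main__":
    q = Fraction(1, 10)
    for k in range(1, 11):
        print(k, float(q_k_partial_sum(10, k, q)), float(q_k_beta(10, k, q)))
```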

Returning to our problem, we certainly can select M ξ-numbers such
that no η-number will have in common k or more digits with any two of the
selected numbers. Then, if only these M ξ-numbers are used by the
transmitter, the probability of incorrect reception will be smaller than
Q_k. It remains to be determined which is the largest value of M for which
this condition is satisfied. The number of η-numbers which have in common
k or more digits with a given ξ-number is given by

    2^n I_{1/2}(k,r) = \sum_{m=0}^{n-k} \frac{n!}{(n-m)!\,m!}
        = 1 + n + \frac{n(n-1)}{2!} + \dots + \frac{n(n-1)\cdots(k+1)}{(n-k)!}    (73)

This number represents a fraction I_{1/2}(k,r) of the total number (2^n)
of η-numbers. It follows that the largest value of M for which the
probability of error is smaller than Q_k cannot be larger than
[I_{1/2}(k,r)]^{-1}

    M \le \frac{1}{I_{1/2}(k,r)}                                    (74)


It does not follow necessarily that Eq. 74 can be satisfied with the
equality sign, since we are not sure that we can find [I_{1/2}(k,r)]^{-1}
ξ-numbers which satisfy our requirement. On the other hand, since it seems
reasonable to expect that M can be made very close to [I_{1/2}(k,r)]^{-1},
at least for large values of n, we shall assume for our work that Eq. 74
holds with the equality sign. Thus, the amount of information that may be
transmitted by means of an n-digit binary number with an overall
probability of error equal to Q_k is

    H_\xi = \log_2 M = -\log_2 I_{1/2}(k,r)                         (75)
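For integral k the quantities of Eqs. 73 to 75 are easily evaluated. The
following sketch takes Eq. 74 with the equality sign, as assumed in the
text; the function names are illustrative.

```python
from math import comb, log2


def sphere_fraction(n: int, k: int) -> float:
    """I_{1/2}(k, r): fraction of the 2^n numbers having k or more digits
    in common with a given number (Eq. 73 divided by 2^n)."""
    return sum(comb(n, m) for m in range(n - k + 1)) / 2 ** n


def max_group_size(n: int, k: int) -> float:
    """Eq. 74 taken with the equality sign."""
    return 1.0 / sphere_fraction(n, k)


def information_per_digit(n: int, k: int) -> float:
    """Eq. 75 divided by n, that is the rate of transmission."""
    return -log2(sphere_fraction(n, k)) / n


if __name__ == "__main__":
    print(max_group_size(5, 3), information_per_digit(5, 3))
```

For n = 5 and k = 3 this gives M = 2 and a rate of 1/5, which is the case
of a single binary digit repeated five times.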

It has been pointed out above that the probability of error Q_k cannot
be used directly as a measure of the loss of information resulting from the
presence of noise. The quantity to be computed for this purpose is the
per-unit equivocation, or in symbols

    E = \frac{-\sum_\eta \sum_\xi p(\eta)\,p(\eta;\xi) \log_2 p(\eta;\xi)}{\log_2 M}    (76)

where the summation over ξ is limited to the M ξ-numbers selected. The
computation of the numerator of Eq. 76 presents a serious problem. The
author could not find a way of performing such a computation which would
avoid the laborious determination of the optimum set of M ξ-numbers. A
satisfactory approximate computation can be carried out, as follows.

Let us consider the η-numbers as divided into M groups, each one
corresponding to one of the M ξ-numbers that are transmitted with equal
probabilities. When a number η is received, the receiver interprets it as
the ξ corresponding to the group to which η belongs. The probability that
an error results from this procedure is Q_k as given by Eq. 72. Let us
assume, because of lack of more precise knowledge, that, when the ξ
transmitted is not the one corresponding to the group to which η belongs,
any other ξ is just as likely to have been the number transmitted; that
is, let us assume that p(η;ξ) is equal to 1 - Q_k for the ξ corresponding
to the group to which η belongs, and is equal to Q_k/(M - 1) for any one
of the other M - 1 numbers ξ. We are now in a position to express E in
terms of known quantities as

    E = \frac{-\sum_\eta p(\eta) [(1 - Q_k) \log_2 (1 - Q_k)
          + (M - 1) \frac{Q_k}{M - 1} \log_2 \frac{Q_k}{M - 1}]}{\log_2 M}
      = -\frac{(1 - Q_k) \log_2 (1 - Q_k) + Q_k \log_2 \frac{Q_k}{M - 1}}{\log_2 M}    (77)

For values of Q_k smaller than about 0.1, that is, for values of Q_k of
practical interest, we have, to a good approximation,

    E \approx Q_k \frac{1 + \ln (1/Q_k) + \ln (M - 1)}{\ln M}       (78)

where ln indicates the natural logarithm.
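The quality of the approximation of Eq. 78 can be judged by evaluating
both expressions. A minimal Python sketch follows; the function names and
the sample value M = 16 are illustrative.

```python
import math


def e_eq77(q_k: float, m_group: int) -> float:
    """Per-unit equivocation according to Eq. 77."""
    return -((1 - q_k) * math.log2(1 - q_k)
             + q_k * math.log2(q_k / (m_group - 1))) / math.log2(m_group)


def e_eq78(q_k: float, m_group: int) -> float:
    """Approximation of Eq. 78, intended for q_k below about 0.1."""
    return q_k * (1 + math.log(1 / q_k) + math.log(m_group - 1)) / math.log(m_group)


if __name__ == "__main__":
    for q_k in (0.001, 0.01, 0.1):
        print(q_k, e_eq77(q_k, 16), e_eq78(q_k, 16))
```

Eq. 78 replaces -(1 - Q_k) ln (1 - Q_k) by Q_k, which is slightly larger,
so that the approximate value is never below that of Eq. 77.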


The physical significance of the approximations involved in the above
computation of E can be interpreted as follows. Suppose that, after
receiving a long message, the receiver had sent back to the transmitter the
numbers received for checking purposes. What fraction of the whole
information content of the message would have to be retransmitted to
correct the errors? In the first place, the transmitter would have to
indicate where, in the message, the errors have occurred. Since the
probability of error is Q_k, the average number of binary digits (per
number in the message) required to indicate whether each number is correct
or incorrect is

    -[Q_k \log_2 Q_k + (1 - Q_k) \log_2 (1 - Q_k)]                  (79)

After the location of each error has been indicated, it is still necessary
to retransmit the correct number; the numbers to be retransmitted
constitute a fraction Q_k of the whole message. Each number may have M - 1
equally likely values, since the number received previously is known to be
incorrect. Thus, this retransmission corresponds to an amount of
information (per number in the original message) equal to

    Q_k \log_2 (M - 1)                                              (80)

Dividing the sum of 79 and 80 by the information corresponding to each
number of the original message (log_2 M) yields the value of E given in
Eq. 77. Thus, since an additional amount of information equal to E log_2 M
(as given in Eq. 77) would be sufficient to specify completely the message,
this same amount must be equal to or larger than the information lost
because of noise. In other words, the actual value of E must be equal to or
smaller than the value given by Eq. 77. The difference between the two
values of E results from the information conveyed by the reception of any
number η, in addition to the knowledge that the number belongs to a
particular group of η-numbers. This additional information would result in
a smaller value of expression 80, because the M - 1 numbers would no longer
be equally likely to be transmitted. It is clear from Eq. 78 that this
difference is significant only when Q_k > 1/(M - 1). In addition, the value
of E given by Eq. 77 is a better estimate of the amount of information
required in practice to correct the errors. A very involved technique would
be required to perform the correction by means of a number of binary digits
corresponding to the actual value of E.
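The retransmission argument amounts to the statement that expressions 79
and 80, added and divided by log_2 M, reproduce Eq. 77. This can be
confirmed numerically with the short sketch below, in which all names and
sample values are illustrative.

```python
import math


def location_bits(q_k: float) -> float:
    """Expression 79: digits per number needed to mark it right or wrong."""
    return -(q_k * math.log2(q_k) + (1 - q_k) * math.log2(1 - q_k))


def retransmission_bits(q_k: float, m_group: int) -> float:
    """Expression 80: digits per number needed to resend the wrong numbers."""
    return q_k * math.log2(m_group - 1)


def correction_fraction(q_k: float, m_group: int) -> float:
    """Sum of expressions 79 and 80 divided by log2 M."""
    return (location_bits(q_k) + retransmission_bits(q_k, m_group)) / math.log2(m_group)


def e_eq77(q_k: float, m_group: int) -> float:
    """Eq. 77, written out independently for comparison."""
    return -((1 - q_k) * math.log2(1 - q_k)
             + q_k * math.log2(q_k / (m_group - 1))) / math.log2(m_group)


if __name__ == "__main__":
    print(correction_fraction(0.05, 16), e_eq77(0.05, 16))
```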

The main results of the computations carried out as indicated above are
presented in Fig. 3. Figure 4 illustrates the behavior of the information
transmitted per number, H_ξ, as a function of r, the maximum number of
incorrect digits for which the number transmitted is still recognized
correctly by the receiver.

Figs. 6 and 7. Probability of error and channel equivocation as functions
of signal-to-noise ratio. (Abscissa: S/N in db.)

It must be understood that these curves have been computed allowing
fractional as well as integral values of r and k. In practice, of course,
only integral values can be used, so that only discrete points of the
curves correspond to physically realizable systems. Figure 5 presents a
plot of the probability of error Q_k as a function of the per-unit
equivocation E according to Eq. 78. Finally, Fig. 6 relates q and H_{y;x}
to the signal-to-noise ratio S/N in decibels for an unmodulated pulse
system with two equally probable levels. The noise is assumed to be
gaussian, that is, with an amplitude probability density


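    f(x) = \frac{1}{\sqrt{2\pi N}}\,e^{-x^2/2N}                     (81)

where N is the noise power. The probability of error per binary pulse is
given by

    q = \frac{1}{\sqrt{2\pi N}} \int_{\sqrt{S}}^{\infty} e^{-x^2/2N}\,dx    (82)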


where S is the average signal power, and the two pulse levels are equal to
±√S. The same result is obtained, of course, for pulse levels equal to
zero and 2√S, if the d-c power is not taken into account.
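With gaussian noise of power N and pulse levels ±√S, the probability of
error per pulse can be written in terms of the complementary error
function, q = (1/2) erfc(√(S/2N)). The sketch below evaluates q and the
corresponding channel equivocation from S/N in decibels; the function
names are illustrative.

```python
import math


def pulse_error_probability(snr_db: float) -> float:
    """Probability of error per binary pulse for levels +sqrt(S) and
    -sqrt(S) in gaussian noise of power N, with S/N given in decibels."""
    snr = 10.0 ** (snr_db / 10.0)            # S/N as a power ratio
    return 0.5 * math.erfc(math.sqrt(snr / 2.0))


def channel_equivocation(q: float) -> float:
    """H_{y;x} for binary equally likely selections with error probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


if __name__ == "__main__":
    q = pulse_error_probability(-3.25)
    print(q, channel_equivocation(q))
```

At S/N = -3.25 db this gives q close to 0.245 and H_{y;x} close to 0.8.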

The curves of Fig. 3 are drawn for a constant rate of transmission,
that is for a constant value of H_ξ/n = 1/5. The parameter n is the number
of digits (pulses) per ξ-number. The H_{y;x} axis up to 0.8 together with
the straight line n = ∞ represent the limiting case of the set of curves
when n approaches infinity. The point H_{y;x} = 0.8 corresponds to the
largest noise (q = 0.245, S/N = -3.25 db) for which 0.2 unit of
information per pulse may be received; in other words, it corresponds to a
channel capacity C = 0.2.

It is clear that this limiting behavior is approached rather slowly
when n is increased. For n = 67 (the largest value for which tables of
I_q(r,k) are available) the curve has a marked threshold behavior but still
it is not close to the theoretical limit.

It did not appear worth while to draw similar curves for larger values
of H_ξ/n because they would approach the theoretical limit at an even
slower rate. In fact the reduction of E is obtained by making use of the
fact that it is unlikely that more than a certain fraction of digits are
received incorrectly. The threshold phenomenon depends on how suddenly Q_k
increases after q reaches a certain value and may be measured roughly by
the ratio of the mean to the standard deviation of q for the density
function dQ_k/dq. This ratio is given (6) by

    T = \frac{\bar{q}}{\sqrt{\overline{q^2} - (\bar{q})^2}}
      = \sqrt{\frac{r}{k}(r + k + 1)}                               (83)

Thus, for a constant value of k + r = n + 1, T is proportional to √(r/k),
which, in turn, decreases when k increases. On the other hand H_ξ/n
increases with k = (n + 1 - r) for a given n as shown in Fig. 4. We may
conclude, therefore, that the threshold phenomenon, characteristic of a
reduction of E, becomes less pronounced, for a given n, when the value of
H_ξ/n increases. It is interesting to note, in addition, that if we take T
as a measure of the sharpness of threshold, and we observe that the ratio
r/k does not change much for a given H_ξ/n, the sharpness of threshold is
proportional to √(n + 2).
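Equation 83 can be checked against the moments of the density dQ_k/dq,
which is proportional to q^{r-1}(1-q)^{k-1}. The sketch below computes the
moments by a simple midpoint summation; the function names and the number
of steps are illustrative choices.

```python
import math


def threshold_sharpness(r: float, k: float) -> float:
    """Eq. 83: ratio of the mean to the standard deviation of q."""
    return math.sqrt(r * (r + k + 1) / k)


def sharpness_from_moments(r: float, k: float, steps: int = 20000) -> float:
    """The same ratio obtained by midpoint summation over the density
    proportional to q^(r-1) (1-q)^(k-1) on the interval 0 to 1."""
    w0 = w1 = w2 = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps
        w = x ** (r - 1) * (1 - x) ** (k - 1)
        w0 += w
        w1 += w * x
        w2 += w * x * x
    mean = w1 / w0
    variance = w2 / w0 - mean ** 2
    return mean / math.sqrt(variance)


if __name__ == "__main__":
    for r, k in ((3, 3), (6, 6), (12, 12)):
        print(r + k - 1, threshold_sharpness(r, k), sharpness_from_moments(r, k))
```

With r = k the first printed column is n and the ratio equals √(n + 2).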

A last point remains to be discussed In connection with Fig. 3. The 
curve for n = 5 corresponds to the case in which the per-unit equivocation
Is reduced by simply repeating the same binary digit 5 times. From a pulse 
transmission point of view this repetition Is equivalent to Increasing the 
pulse length by a factor of 5. This, in turn, would permit a 5 to 1 reduc- 
tion of the bandwidth required for transmission and a consequent reduction 


of the noise power by the same factor. In spite of this apparent
equivalence, the process of repeating 5 times the same pulse indication and
the process of reducing the bandwidth by a factor of 5 do not lead to the
same value of per-unit equivocation. The curve marked "1/5 band" in Fig. 3,
which applies to the latter process, was computed by means of the curves of
Fig. 6. This curve lies considerably below the curve for n = 5 and mostly
between the curves for n = 10 and n = 15. We must keep in mind, in this
regard, that the values of E computed by means of Eq. 77 are somewhat
higher than the correct values. Still, we should expect repetition to be
less effective than band reduction in lowering the per-unit equivocation,
because filtering is an averaging process which takes into account the
whole noise wave, while the repetition process in our scheme takes into
account only how often the noise amplitude is larger than a certain value.*
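The comparison between repetition and band reduction can be sketched
numerically. The code below is only an illustration under stated
assumptions: the repetition scheme is evaluated by Eq. 77 with M = 2,
n = 5 and k = 3, and band reduction is taken to raise the signal-to-noise
power ratio by the factor 5 with one binary digit per lengthened pulse.
The function names are illustrative and the numbers are not those of
Fig. 3.

```python
import math
from math import comb


def binary_entropy(q: float) -> float:
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def pulse_error(snr: float) -> float:
    """Eq. 82 for a signal-to-noise power ratio snr."""
    return 0.5 * math.erfc(math.sqrt(snr / 2.0))


def e_repetition(snr_db: float, n: int = 5) -> float:
    """Eq. 77 with M = 2: each digit sent n times (n odd), majority decision."""
    q = pulse_error(10 ** (snr_db / 10))
    k = n // 2 + 1                     # correct digits needed for a right decision
    q_k = sum(comb(n, m) * (1 - q) ** m * q ** (n - m) for m in range(k))
    return binary_entropy(q_k)


def e_band_reduction(snr_db: float, factor: int = 5) -> float:
    """Equivocation per digit when the noise power is reduced by `factor`."""
    return binary_entropy(pulse_error(factor * 10 ** (snr_db / 10)))


if __name__ == "__main__":
    for snr_db in (-6.0, -3.25, 0.0):
        print(snr_db, e_repetition(snr_db), e_band_reduction(snr_db))
```

Under these assumptions the band-reduction values lie below the repetition
values at each of the sample noise levels, in agreement with the
qualitative statement above.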

In conclusion, it appears that slowing down the rate of transmission
by reducing the bandwidth is the best way to lower the per-unit
equivocation, E, in the case of a very noisy channel. Larger reductions of
E are possible, in theory, but they require very complex coding and
appreciable transmission delay.


* Two coding schemes which take into account the whole noise wave are
analyzed in a recent paper (7) by S. O. Rice.





References


1. N. Wiener: Cybernetics (John Wiley, N.Y., 1948).

2. C. E. Shannon: A Mathematical Theory of Communication. B.S.T.J. 27,
   Nos. 3 and 4 (July and Oct. 1948).

3. R. M. Fano: The Transmission of Information. Technical Report No. 65,
   Research Laboratory of Electronics, M.I.T. (1949).

4. K. Pearson: Tables of the Incomplete Beta-Function (The University
   Press, Cambridge, England, 1934).

5. C. M. Thompson: Tables of Percentage Points of the Incomplete Beta-
   Function. Biometrika 151 (Oct. 1941).

6. H. E. Soper: The Numerical Evaluation of the Incomplete Beta-Function.
   Tracts for Computers, No. VII (Cambridge University Press, London,
   1921).

7. S. O. Rice: Communication in the Presence of Noise - Probability of
   Error for Two Encoding Schemes. B.S.T.J. 29, No. 1 (Jan. 1950).



